    Blind Unmixing Using A Double Deep Image Prior

    In this paper, we propose a novel network structure to solve the blind hyperspectral unmixing problem using a double Deep Image Prior (DIP). In particular, the blind unmixing problem involves two sub-problems: endmember estimation and abundance estimation. We therefore propose two sub-networks, an endmember estimation DIP (EDIP) and an abundance estimation DIP (ADIP), to generate the endmember estimates and the corresponding abundance estimates, respectively. The overall network is then constructed by assembling these two sub-networks. The network is trained in an end-to-end manner by minimizing a novel composite loss function. Experiments on synthetic and real datasets show the effectiveness of the proposed method over state-of-the-art unmixing methods.
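
    The abstract does not spell out the architecture, but the assembly of the two priors can be sketched as follows: each sub-network maps a fixed noise input to one factor, and the product of the two factors is matched to the observed image. All layer sizes, the sigmoid/softmax constraints, and the plain reconstruction loss below are our illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch of the double-DIP idea for blind unmixing, assuming a
# hyperspectral image Y of shape (bands, pixels) with p endmembers.
import torch
import torch.nn as nn

bands, pixels, p = 200, 64 * 64, 4  # illustrative dimensions

class EDIP(nn.Module):
    """Maps a fixed noise code to an endmember matrix E (bands x p)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                                 nn.Linear(256, bands * p))
    def forward(self, z):
        # Sigmoid keeps reflectances in [0, 1].
        return torch.sigmoid(self.net(z)).view(bands, p)

class ADIP(nn.Module):
    """Maps a fixed noise image to abundance maps A (p x pixels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(p, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, p, 3, padding=1))
    def forward(self, z):
        a = self.net(z)  # shape (1, p, H, W)
        # Softmax over endmembers enforces non-negativity and sum-to-one.
        return torch.softmax(a, dim=1).view(p, pixels)

edip, adip = EDIP(), ADIP()
z_e = torch.randn(64)            # fixed noise input for EDIP
z_a = torch.randn(1, p, 64, 64)  # fixed noise input for ADIP
opt = torch.optim.Adam(list(edip.parameters()) + list(adip.parameters()), lr=1e-3)

Y = torch.rand(bands, pixels)    # stand-in for the observed hyperspectral image

for step in range(2000):
    E, A = edip(z_e), adip(z_a)
    # Reconstruction term only; the paper's composite loss likely adds
    # regularizers we do not reproduce here.
    loss = torch.norm(Y - E @ A) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
```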

    An ADMM Based Network for Hyperspectral Unmixing Tasks

    In this paper, we use algorithm unrolling to design a new neural network structure applicable to hyperspectral unmixing problems. In particular, building upon a constrained sparse regression formulation of the underlying unmixing problem, we unroll an ADMM solver onto a neural network architecture that delivers the abundances of different (known) endmembers given a reflectance spectrum. Our proposed network, which can be readily trained using standard supervised learning procedures, is shown to possess a richer structure than competing architectures, consisting of various skip connections and shortcuts. Moreover, it also delivers state-of-the-art unmixing performance compared to competing methods.
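
    A minimal sketch of what one ADMM iteration looks like once unrolled into a trainable layer, assuming the sparse regression formulation min_a 0.5||y - Ma||^2 + lambda*||a||_1 with a known endmember matrix M. The (W1, W2, theta) parameterization is a common unrolling choice, not necessarily the paper's exact one; note how the dual update naturally yields the skip connections mentioned above.

```python
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    return torch.sign(x) * torch.relu(torch.abs(x) - theta)

class ADMMLayer(nn.Module):
    def __init__(self, bands, p):
        super().__init__()
        # Learnable surrogates for (M^T M + rho I)^{-1} M^T and its rho-scaled
        # companion, initialized and trained freely as in typical unrolling.
        self.W1 = nn.Linear(bands, p, bias=False)
        self.W2 = nn.Linear(p, p, bias=False)
        self.theta = nn.Parameter(torch.tensor(0.1))
    def forward(self, y, z, u):
        a = self.W1(y) + self.W2(z - u)        # a-update (learned least squares)
        z = soft_threshold(a + u, self.theta)  # z-update (proximal shrinkage)
        u = u + a - z                          # dual update (acts as a skip path)
        return a, z, u

class UnrolledADMM(nn.Module):
    def __init__(self, bands, p, n_layers=10):
        super().__init__()
        self.layers = nn.ModuleList(ADMMLayer(bands, p) for _ in range(n_layers))
        self.p = p
    def forward(self, y):
        z = torch.zeros(y.shape[0], self.p, device=y.device)
        u = torch.zeros_like(z)
        for layer in self.layers:
            a, z, u = layer(y, z, u)
        return torch.relu(z)  # non-negative abundances at the output

net = UnrolledADMM(bands=200, p=4)
y = torch.rand(8, 200)   # a batch of reflectance spectra
abundances = net(y)      # trained with a standard supervised loss (e.g. MSE)
```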

    Toward Minimal-Sufficiency in Regression Tasks: An Approach Based on a Variational Estimation Bottleneck

    We propose a new variational estimation bottleneck based on a mean-squared error metric to aid regression tasks. In particular, this bottleneck, which draws inspiration from the variational information bottleneck used in classification settings, consists of two components: (1) one captures the notion of Vr-sufficiency, quantifying the ability of an estimator in some class of estimators Vr to infer the quantity of interest; (2) the other appears to capture a notion of Vr-minimality, quantifying the ability of the estimator to generalize to new data. We demonstrate how to train this bottleneck for regression problems. We also conduct various experiments in image denoising and deraining applications, showcasing that our proposed approach can lead to neural network regressors that offer better performance without suffering from overfitting.
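
    The abstract does not give the exact form of the two terms. The sketch below mirrors the classification VIB, replacing cross-entropy with MSE as the sufficiency term; modelling the minimality term as a KL divergence to a standard normal prior is our assumption by analogy, not the paper's stated objective.

```python
# A VIB-style training objective for regression, assuming a stochastic
# encoder q(t|x) = N(mu(x), sigma(x)^2) and a simple estimator head g(t)
# standing in for the class of estimators V_r.
import torch
import torch.nn as nn

class BottleneckRegressor(nn.Module):
    def __init__(self, d_in, d_t):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu = nn.Linear(128, d_t)
        self.log_var = nn.Linear(128, d_t)
        self.estimator = nn.Linear(d_t, 1)
    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        t = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterize
        return self.estimator(t), mu, log_var

def bottleneck_loss(y_hat, y, mu, log_var, beta=1e-3):
    sufficiency = torch.mean((y_hat.squeeze(-1) - y) ** 2)  # MSE sufficiency term
    # KL(N(mu, sigma^2) || N(0, I)) as an assumed minimality surrogate.
    minimality = -0.5 * torch.mean(1 + log_var - mu ** 2 - log_var.exp())
    return sufficiency + beta * minimality
```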

    Regression with Deep Neural Networks: Generalization Error Guarantees, Learning Algorithms, and Regularizers

    We present new data-dependent characterizations of the generalization capability of deep neural network-based data representations within the context of regression tasks. In particular, we propose new generalization error bounds that depend on various elements associated with the learning problem, such as the complexity of the data space, the cardinality of the training set, and the input-output Jacobian of the deep neural network. Moreover, building upon our bounds, we propose new regularization strategies that constrain the network's Lipschitz properties through norms of the network gradient. Experimental results show that our newly proposed regularization techniques can deliver state-of-the-art performance in comparison to established weight-based regularization.
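
    A sketch of the kind of gradient-norm regularizer described above, assuming a scalar-output regressor; the penalty weight lam and the small two-layer network are placeholders, not the paper's configuration.

```python
# Penalizing the norm of the input-output gradient constrains the network's
# Lipschitz behaviour, in the spirit of the regularizers described above.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def regularized_loss(x, y, lam=1e-2):
    x = x.requires_grad_(True)
    y_hat = net(x)
    mse = torch.mean((y_hat.squeeze(-1) - y) ** 2)
    # Input-output Jacobian of the scalar output, obtained via autograd;
    # create_graph=True lets the penalty itself be differentiated in training.
    grad_x, = torch.autograd.grad(y_hat.sum(), x, create_graph=True)
    penalty = grad_x.pow(2).sum(dim=1).mean()  # mean squared gradient norm
    return mse + lam * penalty
```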

    On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches

    The learning process of a neural network, along with its connections to fitting, compression, and generalization, is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, replacing the mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) with more tractable ones: (1) the minimum mean-squared error associated with the reconstruction of the network input data from some intermediate network representation, and (2) the cross-entropy associated with a certain class label given some network representation. We then conduct an empirical study to ascertain how different network models, learning algorithms, and datasets affect the learning dynamics. Our experiments show that our proposed approach appears to be more reliable than classical information bottleneck approaches in capturing network dynamics during both the training and testing phases. They also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
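
    The two surrogate measures lend themselves to a simple probing procedure; a hedged sketch follows, where the small linear decoder and probe fitted per training checkpoint are our assumptions about how such quantities could be estimated in practice.

```python
# Given detached representations from a frozen network checkpoint:
# (1) fit a decoder to reconstruct the input and report its MSE as an
#     MMSE estimate, and (2) fit a probe and report its cross-entropy.
import torch
import torch.nn as nn

def mmse_and_ce(reps, inputs, labels, n_classes, epochs=200):
    """reps: (N, d) detached activations; inputs: (N, d_in); labels: (N,)."""
    decoder = nn.Linear(reps.shape[1], inputs.shape[1])  # reconstruction surrogate
    probe = nn.Linear(reps.shape[1], n_classes)          # label surrogate
    opt = torch.optim.Adam(list(decoder.parameters()) + list(probe.parameters()),
                           lr=1e-2)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        loss = torch.mean((decoder(reps) - inputs) ** 2) + ce(probe(reps), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # report the fitted surrogates for this checkpoint
        mmse_estimate = torch.mean((decoder(reps) - inputs) ** 2).item()
        cross_entropy = ce(probe(reps), labels).item()
    return mmse_estimate, cross_entropy
```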

    Optimization Guarantees for ISTA and ADMM Based Unfolded Networks

    Recently, unfolding techniques have been widely utilized to solve inverse problems in various applications. In this paper, we study optimization guarantees for two popular unfolded networks: those derived from the iterative soft-thresholding algorithm (ISTA) and those derived from the alternating direction method of multipliers (ADMM). Our guarantees, which leverage the Polyak-Łojasiewicz* (PL*) condition, state that the training (empirical) loss decreases to zero as the number of gradient descent epochs increases, provided that the number of training samples is less than some threshold that depends on various quantities underlying the desired information processing task. Our guarantees also show that this threshold is larger for unfolded ISTA than for unfolded ADMM, suggesting that there are certain regimes of the number of training samples where the training error of unfolded ADMM does not converge to zero whereas the training error of unfolded ISTA does. A number of numerical results are provided that back up our theoretical findings.
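
    For context, the PL* condition underpinning such guarantees has the following standard form; this is the generic statement from PL*-style analyses of over-parameterized training, not a result copied from the paper.

```latex
% mu-PL* condition on the empirical loss L over a region B:
\[
  \|\nabla L(w)\|^{2} \;\ge\; \mu\, L(w) \qquad \text{for all } w \in B .
\]
% When it holds along the optimization path, gradient descent with a
% suitably small step size \eta satisfies
\[
  L(w_{t}) \;\le\; (1 - \eta\mu)^{t}\, L(w_{0}),
\]
% i.e. the training loss decays geometrically to zero, matching the
% convergence statement in the abstract.
```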

    REST: Robust lEarned Shrinkage-Thresholding Network Taming Inverse Problems with Model Mismatch

    We consider compressive sensing problems with model mismatch, where one wishes to recover a sparse high-dimensional vector from low-dimensional observations subject to uncertainty in the measurement operator. In particular, we design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem. Our proposed network, named Robust lEarned Shrinkage-Thresholding (REST), exhibits additional features, including an enlarged number of parameters and normalization processing, compared to the state-of-the-art Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) architecture, leading to reliable recovery of the signal under sample-wise varying model mismatch. Our network is also shown to outperform LISTA in compressive sensing problems under sample-wise varying model mismatch.
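
    The abstract states that REST augments LISTA with extra parameters and a normalization step but does not give their exact form; the sketch below is our illustrative stand-in (an extra learned operator plus a residual-based rescaling), not the paper's design.

```python
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    return torch.sign(x) * torch.relu(torch.abs(x) - theta)

class RESTLayer(nn.Module):
    def __init__(self, m, n):
        super().__init__()
        self.We = nn.Linear(m, n, bias=False)  # measurement-to-signal map
        self.Wg = nn.Linear(n, n, bias=False)  # state transition (as in LISTA)
        self.Wr = nn.Linear(n, m, bias=False)  # extra learned operator: REST
                                               # carries more parameters than LISTA
        self.theta = nn.Parameter(torch.tensor(0.1))
    def forward(self, y, x):
        x = soft_threshold(self.We(y) + self.Wg(x), self.theta)  # LISTA-style step
        # Normalization step: keep the estimate's scale consistent with the
        # measurements, for robustness to a perturbed measurement operator.
        r = self.Wr(x)
        scale = y.norm(dim=1, keepdim=True) / (r.norm(dim=1, keepdim=True) + 1e-8)
        return x * scale

m, n = 32, 128
layers = nn.ModuleList(RESTLayer(m, n) for _ in range(8))
y = torch.randn(4, m)          # a batch of (possibly mismatched) measurements
x = torch.zeros(4, n)
for layer in layers:           # unfolded iterations
    x = layer(y, x)
```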

    Tighter Expected Generalization Error Bounds via Convexity of Information Measures

    Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including the Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of these information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
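
    The tightening rests on a convexity argument that can be stated compactly; the following is our paraphrase of the standard inequality, not a formula copied from the paper. Here W denotes the output hypothesis, Z_i the i-th training sample, and \mathbb{W} the Wasserstein distance.

```latex
% Convexity step: the Wasserstein distance is jointly convex in its two
% arguments (the same holds for total variation), hence
\[
  \mathbb{W}\!\Big( \tfrac{1}{n} \textstyle\sum_{i=1}^{n} P_{W,Z_i},\;
                    \tfrac{1}{n} \textstyle\sum_{i=1}^{n} P_{W} \otimes P_{Z_i} \Big)
  \;\le\;
  \tfrac{1}{n} \textstyle\sum_{i=1}^{n}
  \mathbb{W}\big( P_{W,Z_i},\, P_{W} \otimes P_{Z_i} \big),
\]
% so a bound evaluated at the average joint distribution is no larger than
% the average of the corresponding individual-sample bounds.
```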

    Neural network-based classification of X-ray fluorescence spectra of artists' pigments: an approach leveraging a synthetic dataset created using the fundamental parameters method

    X-ray fluorescence (XRF) spectroscopy is an analytical technique used to identify chemical elements that has found widespread use in the cultural heritage sector to characterise artists' materials, including the pigments in paintings. It generates a spectrum with characteristic emission lines relating to the elements present, which is interpreted by an expert to understand the materials therein. Convolutional neural networks (CNNs) are an effective method for automating such classification tasks (an increasingly important capability as XRF datasets continue to grow in size), but they require large training libraries that capture the natural variation of each class. As an alternative to acquiring such a large library of XRF spectra of artists' materials, a physical model, the Fundamental Parameters (FP) method, was used to generate a synthetic dataset of XRF spectra representative of pigments typically encountered in Renaissance paintings, which could then be used to train a neural network. The synthetic spectra generated, modelled as single layers of individual pigments, had characteristic element lines closely matching those found in real XRF spectra. However, as the method did not incorporate effects from the X-ray source, the synthetic spectra lacked the continuum and the Rayleigh and Compton scatter peaks. Nevertheless, the network trained on the synthetic dataset achieved 100% accuracy when tested on synthetic XRF data. Whilst this initial network only attained 55% accuracy when tested on real XRF spectra obtained from reference samples, applying transfer learning using a small quantity of such real XRF spectra increased the accuracy to 96%. Given these promising results, the network was also tested on select data acquired during macro XRF (MA-XRF) scanning of a painting, to challenge the model with noisier spectra. Although only tested on spectra from relatively simple paint passages, the results suggest that the FP method can be used to create accurate synthetic XRF spectra of individual artists' pigments, free from X-ray tube effects, on which a classification model can be trained for application to real XRF data, and that the method has the potential to be extended to more complex paint mixtures and stratigraphies.
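
    A minimal sketch of the pipeline described above: a small 1D CNN trained on synthetic FP-generated spectra, then fine-tuned on a few real spectra. The spectrum length, channel counts, class count, and the choice to freeze the convolutional feature extractor during fine-tuning are our illustrative assumptions.

```python
import torch
import torch.nn as nn

n_channels, n_pigments = 2048, 12  # illustrative spectrum length / class count

cnn = nn.Sequential(
    nn.Conv1d(1, 16, 7, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, 7, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, n_pigments),
)

def train(model, spectra, labels, epochs, lr=1e-3):
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        loss = ce(model(spectra.unsqueeze(1)), labels)  # (N, 1, n_channels) input
        opt.zero_grad(); loss.backward(); opt.step()

# Stand-ins for the actual datasets (synthetic FP spectra, small real set).
synthetic_spectra = torch.rand(256, n_channels)
synthetic_labels = torch.randint(0, n_pigments, (256,))
real_spectra_small = torch.rand(24, n_channels)
real_labels_small = torch.randint(0, n_pigments, (24,))

# 1) Train on the large synthetic dataset.
train(cnn, synthetic_spectra, synthetic_labels, epochs=50)
# 2) Transfer learning: freeze the feature extractor, fine-tune the classifier
#    head on a small quantity of real XRF spectra.
for layer in list(cnn.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False
train(cnn, real_spectra_small, real_labels_small, epochs=20, lr=1e-4)
```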

    An Information-theoretical Approach to Semi-supervised Learning under Covariate-shift

    A common assumption in semi-supervised learning is that the labeled, unlabeled, and test data are drawn from the same distribution. However, this assumption is not satisfied in many applications. In many scenarios, data is collected sequentially (e.g., in healthcare) and its distribution may change over time, often exhibiting so-called covariate shift. In this paper, we propose an approach to semi-supervised learning that is capable of addressing this issue. Our framework also recovers some popular methods, including entropy minimization and pseudo-labeling. We provide new information-theoretic generalization error upper bounds inspired by our novel framework. Our bounds are applicable to both general semi-supervised learning and the covariate-shift scenario. Finally, we show numerically that our method outperforms previous approaches proposed for semi-supervised learning under covariate shift.
    Comment: Accepted at AISTATS 202
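
    A hedged sketch of the kind of objective such a framework unifies: a supervised term on labeled data plus an unlabeled term that reduces to entropy minimization (soft targets) or pseudo-labeling (hard targets). The importance weights w_u correcting for covariate shift are assumed given (e.g., from a density-ratio estimator); this is our illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def ssl_loss(model, x_l, y_l, x_u, w_u, lam=0.5, hard=False):
    sup = F.cross_entropy(model(x_l), y_l)  # supervised term on labeled data
    logits_u = model(x_u)
    p = torch.softmax(logits_u, dim=1)
    if hard:   # pseudo-labeling: cross-entropy against the argmax label
        unsup = F.cross_entropy(logits_u, p.argmax(dim=1), reduction="none")
    else:      # entropy minimization: H(p) per unlabeled sample
        unsup = -(p * torch.log(p + 1e-8)).sum(dim=1)
    # Importance weights reweight unlabeled samples under covariate shift.
    return sup + lam * (w_u * unsup).mean()
```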